Contributor
AMDGPU.jl Benchmarks
| Benchmark suite | Current: bfc256a | Previous: 756602c | Ratio |
|---|---|---|---|
| amdgpu/synchronization/context/device | 610 ns | 600 ns | 1.02 |
| amdgpu/synchronization/stream/blocking | 250 ns | 240 ns | 1.04 |
| amdgpu/synchronization/stream/nonblocking | 340 ns | 340 ns | 1 |
| array/accumulate/Float32/1d | 86361 ns | 86251 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 397256 ns | 393845 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 135712 ns | 131681 ns | 1.03 |
| array/accumulate/Float32/dims=2 | 133462 ns | 103022 ns | 1.30 |
| array/accumulate/Float32/dims=2L | 2805579 ns | 2827930 ns | 0.99 |
| array/accumulate/Int64/1d | 96171 ns | 96412 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 407356 ns | 285244 ns | 1.43 |
| array/accumulate/Int64/dims=1L | 167053 ns | 160812 ns | 1.04 |
| array/accumulate/Int64/dims=2 | 126791 ns | 120772 ns | 1.05 |
| array/accumulate/Int64/dims=2L | 2987371 ns | 3014433 ns | 0.99 |
| array/broadcast | 93662 ns | 128932 ns | 0.73 |
| array/construct | 1590 ns | 1680 ns | 0.95 |
| array/copy | 37621 ns | 39371 ns | 0.96 |
| array/copyto!/cpu_to_gpu | 184263 ns | 114832 ns | 1.60 |
| array/copyto!/gpu_to_cpu | 183652 ns | 152432 ns | 1.20 |
| array/copyto!/gpu_to_gpu | 126892 ns | 88321 ns | 1.44 |
| array/iteration/findall/bool | 179892 ns | 181912 ns | 0.99 |
| array/iteration/findall/int | 187303 ns | 190933 ns | 0.98 |
| array/iteration/findfirst/bool | 123721 ns | 114451 ns | 1.08 |
| array/iteration/findfirst/int | 118372 ns | 116331 ns | 1.02 |
| array/iteration/findmin/1d | 166743 ns | 166203 ns | 1.00 |
| array/iteration/findmin/2d | 155752 ns | 156173 ns | 1.00 |
| array/iteration/logical | 348885 ns | 346025 ns | 1.01 |
| array/iteration/scalar | 288354 ns | 289864 ns | 0.99 |
| array/permutedims/2d | 74901 ns | 64761 ns | 1.16 |
| array/permutedims/3d | 74231 ns | 73791 ns | 1.01 |
| array/permutedims/4d | 76831 ns | 76481 ns | 1.00 |
| array/random/rand/Float32 | 50981 ns | 51540 ns | 0.99 |
| array/random/rand/Int64 | 57291 ns | 56210 ns | 1.02 |
| array/random/rand!/Float32 | 92261 ns | 142162 ns | 0.65 |
| array/random/rand!/Int64 | 116052 ns | 141832 ns | 0.82 |
| array/random/randn/Float32 | 100182 ns | 86921 ns | 1.15 |
| array/random/randn!/Float32 | 116242 ns | 152202 ns | 0.76 |
| array/reductions/mapreduce/Float32/1d | 132841 ns | 132902 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1 | 95592 ns | 95052 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=1L | 772881 ns | 777081 ns | 0.99 |
| array/reductions/mapreduce/Float32/dims=2 | 96871 ns | 96731 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=2L | 298774 ns | 299584 ns | 1.00 |
| array/reductions/mapreduce/Int64/1d | 134062 ns | 133322 ns | 1.01 |
| array/reductions/mapreduce/Int64/dims=1 | 95462 ns | 78081 ns | 1.22 |
| array/reductions/mapreduce/Int64/dims=1L | 783781 ns | 783471 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 94881 ns | 96252 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 300434 ns | 308254 ns | 0.97 |
| array/reductions/reduce/Float32/1d | 132662 ns | 132802 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1 | 95001 ns | 94832 ns | 1.00 |
| array/reductions/reduce/Float32/dims=1L | 773371 ns | 774621 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 97222 ns | 96802 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 297484 ns | 307245 ns | 0.97 |
| array/reductions/reduce/Int64/1d | 133152 ns | 129672 ns | 1.03 |
| array/reductions/reduce/Int64/dims=1 | 94852 ns | 78151 ns | 1.21 |
| array/reductions/reduce/Int64/dims=1L | 780821 ns | 781931 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 96111 ns | 96192 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 303334 ns | 298414 ns | 1.02 |
| array/reverse/1d | 43921 ns | 44380 ns | 0.99 |
| array/reverse/1dL | 75401 ns | 74131 ns | 1.02 |
| array/reverse/1dL_inplace | 112692 ns | 108282 ns | 1.04 |
| array/reverse/1d_inplace | 77891 ns | 86471 ns | 0.90 |
| array/reverse/2d | 52211 ns | 50661 ns | 1.03 |
| array/reverse/2dL | 101491 ns | 100341 ns | 1.01 |
| array/reverse/2dL_inplace | 130102 ns | 117622 ns | 1.11 |
| array/reverse/2d_inplace | 79911 ns | 95391 ns | 0.84 |
| array/sorting/1d | 341775 ns | 341945 ns | 1.00 |
| integration/byval/reference | 39130 ns | 38830 ns | 1.01 |
| integration/byval/slices=1 | 40161 ns | 40880 ns | 0.98 |
| integration/byval/slices=2 | 140522 ns | 158462 ns | 0.89 |
| integration/byval/slices=3 | 237983 ns | 238013 ns | 1.00 |
| integration/volumerhs | 5050459 ns | 4942659 ns | 1.02 |
| kernel/indexing | 129511 ns | 43630 ns | 2.97 |
| kernel/indexing_checked | 124942 ns | 128022 ns | 0.98 |
| kernel/launch | 1310 ns | 1290 ns | 1.02 |
| kernel/rand | 123941 ns | 106671 ns | 1.16 |
| latency/import | 1493016620 ns | 1501349912 ns | 0.99 |
| latency/precompile | 11942427457 ns | 12041117438 ns | 0.99 |
| latency/ttfp | 10901786827 ns | 10491950084 ns | 1.04 |

This comment was automatically generated by workflow using github-action-benchmark.
Member
Author
cscs-ci run
Fix `rocfft_status_failure` when executing FFT on a spawned Julia task (Julia 1.12+)

In Julia 1.12, `Threads.@spawn` seems to inherit task-local storage (via `Base.copy`?), so the spawned task sees the same `HIPStream` as the parent. `update_stream!` was guarded by `plan.stream != new_stream`, so in that case it skipped `rocfft_execution_info_set_stream`. However, rocFFT requires this call to be made on the same OS thread as `rocfft_execute`. With multiple Julia threads, the spawned task can land on a different OS thread, causing `rocfft_status_failure`.

The fix is to call `rocfft_execution_info_set_stream` unconditionally before every execution, dropping the `plan.stream != new_stream` guard. The failure is unobservable in a (single-threaded) REPL (all tasks share one OS thread?), but it shows up intermittently in CI.
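
The difference between the guarded and the unconditional update can be sketched with a small pure-Julia simulation. This is only an illustration of the reasoning above, not the AMDGPU.jl code: `MockPlan`, `mock_set_stream!`, `mock_execute`, the two `update_stream_*!` variants, and the integer "OS thread" ids are all hypothetical stand-ins, and the rule that execution fails unless the stream was bound on the executing thread is modeled directly from the description rather than taken from rocFFT itself.

```julia
# Hypothetical stand-ins: none of these names come from AMDGPU.jl or rocFFT.
mutable struct MockPlan
    stream::Symbol       # stream the plan believes it is bound to
    bound_thread::Int    # simulated OS thread that last bound the stream
end

# Simulates the library call that has to run on the executing thread.
function mock_set_stream!(plan::MockPlan, stream::Symbol, os_thread::Int)
    plan.stream = stream
    plan.bound_thread = os_thread
    return plan
end

# Old behaviour: skip the call when the stream looks unchanged.
function update_stream_guarded!(plan::MockPlan, new_stream::Symbol, os_thread::Int)
    if plan.stream != new_stream
        mock_set_stream!(plan, new_stream, os_thread)
    end
    return plan
end

# New behaviour: always rebind right before executing.
update_stream_unconditional!(plan::MockPlan, new_stream::Symbol, os_thread::Int) =
    mock_set_stream!(plan, new_stream, os_thread)

# Simulated execution: succeeds only if the stream was bound on this thread.
mock_execute(plan::MockPlan, os_thread::Int) =
    plan.bound_thread == os_thread ? :success : :failure

# Parent task runs on simulated thread 1; the spawned task inherits the
# same stream but runs on simulated thread 2.
function run_case(update!)
    plan = MockPlan(:none, 0)
    update!(plan, :stream_a, 1)
    parent = mock_execute(plan, 1)
    update!(plan, :stream_a, 2)
    child = mock_execute(plan, 2)
    return (parent, child)
end

guarded = run_case(update_stream_guarded!)
unconditional = run_case(update_stream_unconditional!)
println("guarded:       ", guarded)
println("unconditional: ", unconditional)
```

In this model the guarded variant fails only for the spawned task, because the inherited stream compares equal and the rebinding is skipped. When both tasks share one simulated thread, the two variants behave the same, which matches the observation that the problem does not appear in a single-threaded REPL.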